Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture
Authors
Abstract
The best arm identification problem (BEST-1-ARM) is the most basic pure exploration problem in stochastic multi-armed bandits. The problem has a long history and has attracted significant attention over the last decade. However, we do not yet have a complete understanding of its optimal sample complexity: the state-of-the-art algorithms achieve a sample complexity of $O\left(\sum_{i=2}^{n} \Delta_i^{-2} \left(\ln \delta^{-1} + \ln\ln \Delta_i^{-1}\right)\right)$ (where $\Delta_i$ is the difference between the largest mean and the $i$-th largest mean), while the best known lower bound is $\Omega\left(\sum_{i=2}^{n} \Delta_i^{-2} \ln \delta^{-1}\right)$ for general instances and $\Omega\left(\Delta^{-2} \ln\ln \Delta^{-1}\right)$ for two-arm instances. We propose to study instance-wise optimality for the BEST-1-ARM problem. Previous work has proved that no instance-optimal algorithm exists for the 2-arm problem. However, we conjecture that, modulo the additive term $\Omega\left(\Delta_2^{-2} \ln\ln \Delta_2^{-1}\right)$ (which is an upper bound and a worst-case lower bound for the 2-arm problem), there is an instance-optimal algorithm for BEST-1-ARM. Moreover, we introduce a new quantity, called the gap entropy of a best-arm problem instance, and conjecture that it is the instance-wise lower bound. Hence, resolving this conjecture would provide a final answer to this old and basic problem.
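To make the two bounds in the abstract concrete, the following minimal Python sketch evaluates the sums inside the $O(\cdot)$ and $\Omega(\cdot)$ expressions for a given instance. The function names (`gaps`, `upper_bound_term`, `lower_bound_term`) and the example means are illustrative assumptions, not part of the paper. Hidden constants are omitted, and the $\ln\ln \Delta_i^{-1}$ term is clamped at zero for large gaps, which is a convenience of this sketch since the expression is only meaningful asymptotically for small gaps.

```python
import math


def gaps(means):
    """Gaps Delta_i (i >= 2) between the largest mean and each other mean."""
    ordered = sorted(means, reverse=True)
    result = [ordered[0] - m for m in ordered[1:]]
    if any(g <= 0 for g in result):
        raise ValueError("the best arm must be unique (all gaps positive)")
    return result


def lower_bound_term(means, delta):
    """Sum over i >= 2 of Delta_i^-2 * ln(1/delta), constants omitted."""
    return sum(g ** -2 * math.log(1 / delta) for g in gaps(means))


def upper_bound_term(means, delta):
    """Sum over i >= 2 of Delta_i^-2 * (ln(1/delta) + ln ln(1/Delta_i))."""
    total = 0.0
    for g in gaps(means):
        inner = math.log(1 / g)
        # Assumption of this sketch: clamp ln ln(1/Delta_i) at 0 when
        # Delta_i >= 1/e, where the double logarithm is non-positive
        # or undefined.
        loglog = math.log(inner) if inner > 1 else 0.0
        total += g ** -2 * (math.log(1 / delta) + loglog)
    return total


if __name__ == "__main__":
    example_means = [0.9, 0.8, 0.5]  # hypothetical instance
    print(lower_bound_term(example_means, delta=0.05))
    print(upper_bound_term(example_means, delta=0.05))
```

The difference between the two values comes entirely from the $\ln\ln \Delta_i^{-1}$ terms, which is the gap between the known upper and lower bounds that the conjecture addresses.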
Similar resources
Towards Instance Optimal Bounds for Best Arm Identification
In the classical best arm identification (Best-1-Arm) problem, we are given n stochastic bandit arms, each associated with a reward distribution with an unknown mean. Upon each play of an arm, we can get a reward sampled i.i.d. from its reward distribution. We would like to identify the arm with the largest mean with probability at least 1 − δ, using as few samples as possible. The problem has ...
Best-Arm Identification in Linear Bandits
We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter θ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In parti...
Lagrangian Relaxation Method for the Step Fixed-Charge Transportation Problem
In this paper, a step fixed-charge transportation problem is developed in which products are sent from sources to destinations in the presence of both unit and step fixed charges. The proposed model determines the amount of products on the existing routes with the aim of minimizing the total cost (the sum of unit and step fixed charges) to satisfy the demand of each customer. As the problem i...
Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection
In the Best-k-Arm problem, we are given n stochastic bandit arms, each associated with an unknown reward distribution. We are required to identify the k arms with the largest means by taking as few samples as possible. In this paper, we make progress towards a complete characterization of the instance-wise sample complexity bounds for the Best-k-Arm problem. On the lower bound side, we obtain a...
Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration
We study the combinatorial pure exploration problem BEST-SET in a stochastic multi-armed bandit game. In a BEST-SET instance, we are given n stochastic arms with unknown reward distributions, as well as a family F of feasible subsets over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify the feasible subset in F with the maximum total weight, us...